HBO

Ashley Wright & Mubeena Wahaj

2023-04-13


Lights, camera, action!

Today, we’re going to take a deep dive into the world of HBO movies and TV shows. From iconic dramas like The Sopranos and Game of Thrones to the latest releases, HBO has been providing quality content to its viewers for decades. But have you ever wondered how they decide which shows to produce or which movies to acquire? That’s where the fascinating world of HBO data comes into play. By analyzing audience trends, ratings, and viewer demographics, HBO can make informed decisions about what to offer its loyal fans. So sit back, grab some popcorn, and get ready to explore the exciting world of HBO data.

The data we’ve decided to work with comes from Kaggle, where it is maintained by Diego Enrique. Here’s the link: https://www.kaggle.com/datasets/dgoenrique/hbo-max-movies-and-tv-shows

Let’s read our data, shall we?

## Rows: 64,879
## Columns: 5
## $ person_id <int> 14701, 14702, 14703, 14704, 14705, 14706, 1367, 14716, 14707…
## $ id        <chr> "tm77588", "tm77588", "tm77588", "tm77588", "tm77588", "tm77…
## $ name      <chr> "Humphrey Bogart", "Ingrid Bergman", "Paul Henreid", "Claude…
## $ character <chr> "Rick Blaine", "Ilsa Lund", "Victor Laszlo", "Captain Louis …
## $ role      <chr> "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR…
## Rows: 3,030
## Columns: 15
## $ id                   <chr> "tm77588", "tm155702", "tm83648", "tm3175", "ts22…
## $ title                <chr> "Casablanca", "The Wizard of Oz", "Citizen Kane",…
## $ type                 <chr> "MOVIE", "MOVIE", "MOVIE", "MOVIE", "SHOW", "MOVI…
## $ description          <chr> "In Casablanca, Morocco in December 1941, a cynic…
## $ release_year         <int> 1943, 1939, 1941, 1945, 1940, 1940, 1946, 1934, 1…
## $ age_certification    <chr> "PG", "G", "PG", "", "", "G", "", "", "", "PG-13"…
## $ runtime              <int> 102, 102, 119, 113, 8, 238, 114, 93, 111, 109, 12…
## $ genres               <chr> "['drama', 'romance', 'war']", "['fantasy', 'fami…
## $ production_countries <chr> "['US']", "['US']", "['US']", "['US']", "['US']",…
## $ seasons              <dbl> NA, NA, NA, NA, 16, NA, NA, NA, NA, NA, NA, NA, N…
## $ imdb_id              <chr> "tt0034583", "tt0032138", "tt0033467", "tt0037059…
## $ imdb_score           <dbl> 8.5, 8.1, 8.3, 7.5, 7.7, 8.2, 7.9, 7.9, 7.9, 8.3,…
## $ imdb_votes           <dbl> 577842, 406105, 446627, 25589, 859, 319463, 87289…
## $ tmdb_popularity      <dbl> 22.005, 56.631, 19.900, 8.311, 1.400, 27.535, 11.…
## $ tmdb_score           <dbl> 8.167, 7.583, 8.022, 7.000, 10.000, 8.000, 7.700,…
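The listings above are in the format printed by `dplyr::glimpse()`. As a rough sketch of the loading step, the snippet below reads a CSV and inspects its structure with base R. The temporary file and its two rows are stand-ins taken from the sample shown in this post; in the real analysis the path would point at the downloaded `credits.csv`, and that location is an assumption.

```r
# Stand-in for the Kaggle download: a two-row excerpt written to a temporary file.
# In the real analysis the path would point at the downloaded credits.csv (assumed location).
credits_path <- tempfile(fileext = ".csv")
writeLines(c(
  "person_id,id,name,character,role",
  "14701,tm77588,Humphrey Bogart,Rick Blaine,ACTOR",
  "14702,tm77588,Ingrid Bergman,Ilsa Lund,ACTOR"
), credits_path)

# stringsAsFactors = FALSE keeps text columns as character vectors.
credits <- read.csv(credits_path, stringsAsFactors = FALSE)

# str() is the base R counterpart of dplyr::glimpse(): one line per column with its type.
str(credits)
```

The same call with the path to `titles.csv` would load the second table.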

Whoops! Let’s make it a little more readable.

Here’s our credits.csv:

Sample table of credits data

| person_id | id | name | character | role |
|---|---|---|---|---|
| 14701 | tm77588 | Humphrey Bogart | Rick Blaine | ACTOR |
| 14702 | tm77588 | Ingrid Bergman | Ilsa Lund | ACTOR |
| 14703 | tm77588 | Paul Henreid | Victor Laszlo | ACTOR |
| 14704 | tm77588 | Claude Rains | Captain Louis Renault | ACTOR |
| 14705 | tm77588 | Conrad Veidt | Major Heinrich Strasser | ACTOR |
| 14706 | tm77588 | Sydney Greenstreet | Signor Ferrari | ACTOR |

And here’s our titles.csv:

Sample table of titles data

| id | title | type | description | release_year | age_certification | runtime | genres | production_countries | seasons | imdb_id | imdb_score | imdb_votes | tmdb_popularity | tmdb_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tm77588 | Casablanca | MOVIE | In Casablanca, Morocco in December 1941, a cynical American expatriate meets a former lover, with unforeseen complications. | 1943 | PG | 102 | ['drama', 'romance', 'war'] | ['US'] | NA | tt0034583 | 8.5 | 577842 | 22.005 | 8.167 |
| tm155702 | The Wizard of Oz | MOVIE | Young Dorothy finds herself in a magical world where she makes friends with a lion, a scarecrow and a tin man as they make their way along the yellow brick road to talk with the Wizard and ask for the things they miss most in their lives. The Wicked Witch of the West is the only thing that could stop them. | 1939 | G | 102 | ['fantasy', 'family'] | ['US'] | NA | tt0032138 | 8.1 | 406105 | 56.631 | 7.583 |
| tm83648 | Citizen Kane | MOVIE | Newspaper magnate, Charles Foster Kane is taken from his mother as a boy and made the ward of a rich industrialist. As a result, every well-meaning, tyrannical or self-destructive move he makes for the rest of his life appears in some way to be a reaction to that deeply wounding event. | 1941 | PG | 119 | ['drama'] | ['US'] | NA | tt0033467 | 8.3 | 446627 | 19.900 | 8.022 |
| tm3175 | Meet Me in St. Louis | MOVIE | In the year before the 1904 St. Louis World’s Fair, the four Smith daughters learn lessons of life and love, even as they prepare for a reluctant move to New York. | 1945 | | 113 | ['drama', 'family', 'romance', 'music', 'comedy'] | ['US'] | NA | tt0037059 | 7.5 | 25589 | 8.311 | 7.000 |
| ts225761 | Tom and Jerry | SHOW | Tom and Jerry is an American animated franchise and series of comedy short films created in 1940 by William Hanna and Joseph Barbera. Best known for its 161 theatrical short films by Metro-Goldwyn-Mayer, the series centers on a friendship/rivalry (a love-hate relationship) between the title characters Tom, a cat, and Jerry, a mouse. Many shorts also feature several recurring characters. | 1940 | | 8 | ['animation', 'comedy', 'family', 'action'] | ['US'] | 16 | tt6422744 | 7.7 | 859 | 1.400 | 10.000 |
| tm156463 | Gone with the Wind | MOVIE | The spoiled daughter of a Georgia plantation owner conducts a tumultuous romance with a cynical profiteer during the American Civil War and Reconstruction Era. | 1940 | G | 238 | ['drama', 'romance', 'war', 'history'] | ['US'] | NA | tt0031381 | 8.2 | 319463 | 27.535 | 8.000 |

First, let’s see how many movies and TV shows we are dealing with.

##    type    n
## 1 MOVIE 2408
## 2  SHOW  622
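A count like this can be sketched in base R with `table()`; `dplyr::count(titles, type)` would give the same shape. The four-row `titles` data frame below is a stand-in built from the sample table, not the full dataset.

```r
# Stand-in excerpt of the titles data (four rows from the sample table).
titles <- data.frame(
  title = c("Casablanca", "The Wizard of Oz", "Citizen Kane", "Tom and Jerry"),
  type  = c("MOVIE", "MOVIE", "MOVIE", "SHOW")
)

# table() tallies each distinct value; as.data.frame() turns the tally
# into a two-column table with the count stored in a column named n.
type_counts <- as.data.frame(table(type = titles$type), responseName = "n")
print(type_counts)
```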

Wow! That’s a lot more movies than shows: 2,408 against 622. Let’s see it in a graph.

And what is the distribution of genres across both?

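One wrinkle with this dataset is that `genres` stores a whole list as a single string such as `['drama', 'romance', 'war']`, so it has to be split before genres can be counted one by one. The sketch below shows one way to do that in base R; it is an illustration on a three-row stand-in, not necessarily how the original counts were produced.

```r
# Stand-in excerpt of the titles data (three rows from the sample table).
titles <- data.frame(
  title  = c("Casablanca", "The Wizard of Oz", "Tom and Jerry"),
  type   = c("MOVIE", "MOVIE", "SHOW"),
  genres = c("['drama', 'romance', 'war']",
             "['fantasy', 'family']",
             "['animation', 'comedy', 'family']")
)

# Strip brackets, quotes and spaces, then split on commas:
# the result is one character vector of genres per title.
# A title with an empty list ("[]") yields a zero-length vector and drops out.
genre_list <- strsplit(gsub("[][' ]", "", titles$genres), ",")

# Repeat each title's type once per genre so the two columns line up.
long <- data.frame(
  type  = rep(titles$type, lengths(genre_list)),
  genre = unlist(genre_list)
)

# Tally every genre/type pair and keep only the pairs that actually occur.
genre_counts <- as.data.frame(table(genre = long$genre, type = long$type),
                              responseName = "n")
genre_counts <- genre_counts[genre_counts$n > 0, ]
print(genre_counts)
```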

Looks like the genre labels are hard to read. Let’s flip our coordinates.

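Flipping the axes in ggplot2 is done with `coord_flip()`, which puts the genre names on the vertical axis where they have room. The sketch below assumes a pre-computed table of counts; the `genre_counts` values are made up purely to have something to draw.

```r
library(ggplot2)

# Hypothetical counts, only for illustration (not taken from the dataset).
genre_counts <- data.frame(
  genre = c("drama", "comedy", "documentation", "drama", "comedy"),
  type  = c("MOVIE", "MOVIE", "MOVIE", "SHOW", "SHOW"),
  n     = c(5, 3, 2, 4, 1)
)

# geom_col() draws bars whose height is the n column;
# coord_flip() swaps the axes so the genre labels read horizontally.
p <- ggplot(genre_counts, aes(x = reorder(genre, n), y = n, fill = type)) +
  geom_col() +
  coord_flip() +
  labs(x = "Genre", y = "Number of titles", fill = "Type")

print(p)
```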

Here is the number of titles available on HBO Max as a function of release year.

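A per-year count can be sketched with dplyr as below. Grouping by both `release_year` and `type` is an assumption about how the original summary was built. Passing `.groups = "drop"` returns an ungrouped table, which also keeps dplyr from printing its note about grouped output.

```r
library(dplyr)

# Stand-in excerpt using titles and release years that appear in this post.
titles <- data.frame(
  title        = c("The Shawshank Redemption",
                   "The Lord of the Rings: The Fellowship of the Ring",
                   "Band of Brothers",
                   "The Lord of the Rings: The Two Towers",
                   "The Wire"),
  type         = c("MOVIE", "MOVIE", "SHOW", "MOVIE", "SHOW"),
  release_year = c(1994L, 2001L, 2001L, 2002L, 2002L)
)

# One row per year/type pair; .groups = "drop" removes the leftover grouping.
per_year <- titles %>%
  group_by(release_year, type) %>%
  summarise(n = n(), .groups = "drop")

print(per_year)
```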

Now let’s see the top 10 most popular movies and shows according to IMDb and TMDB.

##                                                title  type release_year
## 1                           The Shawshank Redemption MOVIE         1994
## 2                                    Celebrity Habla MOVIE         2009
## 3                                  Emergency Contact MOVIE         2015
## 4                                    The Dark Knight MOVIE         2008
## 5      The Lord of the Rings: The Return of the King MOVIE         2003
## 6                Euphoria: Trouble Don't Last Always MOVIE         2020
## 7        Juan Luis Guerra 4.40: Entre Mar y Palmeras MOVIE         2021
## 8  The Lord of the Rings: The Fellowship of the Ring MOVIE         2001
## 9              The Lord of the Rings: The Two Towers MOVIE         2002
## 10                                 Celebrity Habla 2 MOVIE         2010
##                                      genres
## 1                                 ['drama']
## 2                         ['documentation']
## 3                                ['comedy']
## 4  ['drama', 'thriller', 'action', 'crime']
## 5            ['fantasy', 'action', 'drama']
## 6                                 ['drama']
## 7                                 ['music']
## 8            ['fantasy', 'action', 'drama']
## 9            ['action', 'fantasy', 'drama']
## 10                        ['documentation']
##                          title type release_year
## 1             Band of Brothers SHOW         2001
## 2                    Chernobyl SHOW         2019
## 3                     The Wire SHOW         2002
## 4            Eyes on the Prize SHOW         1987
## 5                 The Sopranos SHOW         1999
## 6              Game of Thrones SHOW         2011
## 7               Rick and Morty SHOW         2013
## 8                    Homegrown SHOW         2021
## 9               The Last of Us SHOW         2023
## 10 Batman: The Animated Series SHOW         1992
##                                                          genres
## 1                         ['drama', 'war', 'history', 'action']
## 2                              ['drama', 'thriller', 'history']
## 3                                ['drama', 'crime', 'thriller']
## 4                                  ['documentation', 'history']
## 5                                            ['drama', 'crime']
## 6            ['scifi', 'drama', 'action', 'romance', 'fantasy']
## 7                    ['animation', 'scifi', 'action', 'comedy']
## 8                                    ['documentation', 'drama']
## 9            ['drama', 'action', 'horror', 'scifi', 'thriller']
## 10 ['family', 'scifi', 'animation', 'action', 'crime', 'drama']
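A ranking like this boils down to sorting by a score column and keeping the first rows. The post does not show which column the lists above were sorted on, so `imdb_score` below is an assumption, and `top_n_by` is a hypothetical helper name. The data is a five-row stand-in from the sample table.

```r
# Stand-in excerpt of the titles data (five movies from the sample table).
titles <- data.frame(
  title      = c("Casablanca", "The Wizard of Oz", "Citizen Kane",
                 "Meet Me in St. Louis", "Gone with the Wind"),
  type       = "MOVIE",
  imdb_score = c(8.5, 8.1, 8.3, 7.5, 8.2)
)

# Hypothetical helper: sort by `column` from highest to lowest and keep the first n rows.
# na.last = NA drops rows whose score is missing instead of ranking them.
top_n_by <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE, na.last = NA), ]
  head(ranked, n)
}

top3 <- top_n_by(titles, "imdb_score", n = 3)
print(top3$title)
```

Filtering on `type` first would give separate lists for movies and shows, and passing `"tmdb_score"` instead would rank by the TMDB rating.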

You can see there is a wide range of movies and TV shows, especially in the years they were released. I wonder what the oldest and newest movies and shows are?

##                     title  type release_year     genres
## 1 The Prince of Magicians MOVIE         1901 ['comedy']
##                            title  type release_year                      genres
## 1 Marc Maron: From Bleak to Dark MOVIE         2023 ['comedy', 'documentation']
##          title type release_year                                        genres
## 1 Looney Tunes SHOW         1929 ['comedy', 'family', 'thriller', 'animation']
##            title type release_year
## 1 The Last of Us SHOW         2023
##                                               genres
## 1 ['drama', 'action', 'horror', 'scifi', 'thriller']

I definitely have not seen either of those movies, but everyone should know The Last of Us because of TikTok.

Now I am wondering: what are the shortest and longest movies and shows?

##                          title  type runtime release_year genres
## 1 An Impossible Balancing Feat MOVIE       1         1902     []
##                    title  type runtime release_year                genres
## 1 Scenes from a Marriage MOVIE     299         1974 ['drama', 'european']
##                title type runtime seasons release_year                  genres
## 1 Meet the Batwheels SHOW       2       1         2022 ['animation', 'action']
##           title type runtime seasons release_year
## 1 Sesame Street SHOW      51      53         1969
##                                                  genres
## 1 ['comedy', 'animation', 'family', 'fantasy', 'music']
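Both the oldest/newest and the shortest/longest lookups follow the same pattern: pick one type, then find the rows holding the minimum and maximum of a column. Here is a minimal sketch of that pattern; `extremes` is a hypothetical helper, and the six rows are a stand-in using values that appear in this post.

```r
# Stand-in excerpt using titles, years and runtimes that appear in this post.
titles <- data.frame(
  title        = c("Casablanca", "Citizen Kane", "Gone with the Wind",
                   "Tom and Jerry", "Sesame Street", "Meet the Batwheels"),
  type         = c("MOVIE", "MOVIE", "MOVIE", "SHOW", "SHOW", "SHOW"),
  release_year = c(1943L, 1941L, 1940L, 1940L, 1969L, 2022L),
  runtime      = c(102L, 119L, 238L, 8L, 51L, 2L)
)

# Hypothetical helper: rows with the minimum and maximum of `column` within one type.
# which.min() and which.max() skip NA and return the first match on ties.
extremes <- function(df, column, kind) {
  subset_df <- df[df$type == kind, ]
  values <- subset_df[[column]]
  subset_df[c(which.min(values), which.max(values)), ]
}

movie_years  <- extremes(titles, "release_year", "MOVIE")
show_runtime <- extremes(titles, "runtime", "SHOW")
print(movie_years)
print(show_runtime)
```

Because ties resolve to the first match, a dataset with several titles sharing the same extreme value would need a filter on the value itself to list them all.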
